{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Downloading Models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Polyglot requires a model for each task and language.\n",
    "These models are essential for the library to function.\n",
    "Given the large size of some of the models, we distribute the models through a download manager separately. The download manager has several modes of operation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modes of Operation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Command Line Mode\n",
    "\n",
    "The subcommand `download` takes a package or more as an argument and download the specified packages in the `polyglot_data` directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "usage: polyglot download [-h] [--dir DIR] [--quiet] [--force] [--exit-on-error] [--url SERVER_INDEX_URL] [packages [packages ...]]\r\n",
      "\r\n",
      "positional arguments:\r\n",
      "  packages              packages to be downloaded\r\n",
      "\r\n",
      "optional arguments:\r\n",
      "  -h, --help            show this help message and exit\r\n",
      "  --dir DIR             download package to directory DIR\r\n",
      "  --quiet               work quietly\r\n",
      "  --force               download even if already installed\r\n",
      "  --exit-on-error       exit if an error occurs\r\n",
      "  --url SERVER_INDEX_URL\r\n",
      "                        download server index url\r\n"
     ]
    }
   ],
   "source": [
    "!polyglot download --help"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[polyglot_data] Downloading package morph2.en to\r\n",
      "[polyglot_data]     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]   Package morph2.en is already up-to-date!\r\n"
     ]
    }
   ],
   "source": [
    "!polyglot download morph2.en"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive Mode\n",
    "\n",
    "You can reach this mode by not supplying any arguments to the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Polyglot Downloader\r\n",
      "---------------------------------------------------------------------------\r\n",
      "  d) Download   l) List    u) Update   c) Config   h) Help   q) Quit\r\n",
      "---------------------------------------------------------------------------\r\n",
      "Downloader> "
     ]
    }
   ],
   "source": [
    "!polyglot download"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Library Interface"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from polyglot.downloader import downloader\n",
    "downloader.download(\"embeddings2.en\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Collections"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You noticed, by now, that we can install a specific model by specifying its name and the target language.\n",
    "\n",
    "Package name format is `task_name.language_code`\n",
    "\n",
    "#### Langauge Collections\n",
    "\n",
    "Packages are grouped by language. For example, if we want to download all the models that are specific to Arabic, the arabic collection of models name is **LANG:** followed by the language code of Arabic which is `ar`.\n",
    "\n",
    "Therefore, we can just run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[polyglot_data] Downloading collection u'LANG:ar'\r\n",
      "[polyglot_data]    | \r\n",
      "[polyglot_data]    | Downloading package tsne2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package tsne2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | Downloading package transliteration2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package transliteration2.ar is already up-to-\r\n",
      "[polyglot_data]    |       date!\r\n",
      "[polyglot_data]    | Downloading package morph2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package morph2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | Downloading package counts2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package counts2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | Downloading package sentiment2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package sentiment2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | Downloading package embeddings2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package embeddings2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | Downloading package ner2.ar to\r\n",
      "[polyglot_data]    |     /home/rmyeid/polyglot_data...\r\n",
      "[polyglot_data]    |   Package ner2.ar is already up-to-date!\r\n",
      "[polyglot_data]    | \r\n",
      "[polyglot_data]  Done downloading collection LANG:ar\r\n"
     ]
    }
   ],
   "source": [
    "!polyglot download LANG:ar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Task Collections\n",
    "\n",
    "Packages are grouped by task. For example, if we want to download all the models that perform transliteration. The collection name is **TASK:** followed by the task name.\n",
    "\n",
    "Therefore, we can just run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "downloader.download(\"TASK:transliteration2\", quiet=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Langauge & Task Support"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can query our download manager for which tasks are supported by polyglot, as the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'embeddings2',\n",
       " u'counts2',\n",
       " u'pos2',\n",
       " u'ner2',\n",
       " u'sentiment2',\n",
       " u'morph2',\n",
       " u'tsne2']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "downloader.supported_tasks(lang=\"en\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can query our download manager for which languages are supported by polyglot named entity recognition subsystem, as the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1. Polish                     2. Turkish                    3. Russian                  \n",
      "  4. Indonesian                 5. Czech                      6. Arabic                   \n",
      "  7. Korean                     8. Catalan; Valencian         9. Italian                  \n",
      " 10. Thai                      11. Romanian, Moldavian, ...  12. Tagalog                  \n",
      " 13. Danish                    14. Finnish                   15. German                   \n",
      " 16. Persian                   17. Dutch                     18. Chinese                  \n",
      " 19. French                    20. Portuguese                21. Slovak                   \n",
      " 22. Hebrew (modern)           23. Malay                     24. Slovene                  \n",
      " 25. Bulgarian                 26. Hindi                     27. Japanese                 \n",
      " 28. Hungarian                 29. Croatian                  30. Ukrainian                \n",
      " 31. Serbian                   32. Lithuanian                33. Norwegian                \n",
      " 34. Latvian                   35. Swedish                   36. English                  \n",
      " 37. Greek, Modern             38. Spanish; Castilian        39. Vietnamese               \n",
      " 40. Estonian                 \n"
     ]
    }
   ],
   "source": [
    "print(downloader.supported_languages_table(task=\"ner2\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can view all the available and/or installed collections or packages through the list function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using default data directory (/home/rmyeid/polyglot_data)\n",
      "=========================================\n",
      " Data server index for <polyglot-models>\n",
      "=========================================\n",
      "Collections:\n",
      "  [ ] LANG:af............. Afrikaans            packages and models\n",
      "  [ ] LANG:als............ als                  packages and models\n",
      "  [ ] LANG:am............. Amharic              packages and models\n",
      "  [ ] LANG:an............. Aragonese            packages and models\n",
      "  [ ] LANG:ar............. Arabic               packages and models\n",
      "  [ ] LANG:arz............ arz                  packages and models\n",
      "  [ ] LANG:as............. Assamese             packages and models\n",
      "  [ ] LANG:ast............ Asturian             packages and models\n",
      "  [ ] LANG:az............. Azerbaijani          packages and models\n",
      "  [ ] LANG:ba............. Bashkir              packages and models\n",
      "  [ ] LANG:bar............ bar                  packages and models\n",
      "  [ ] LANG:be............. Belarusian           packages and models\n",
      "  [ ] LANG:bg............. Bulgarian            packages and models\n",
      "  [ ] LANG:bn............. Bengali              packages and models\n",
      "  [ ] LANG:bo............. Tibetan              packages and models\n",
      "  [ ] LANG:bpy............ bpy                  packages and models\n",
      "  [ ] LANG:br............. Breton               packages and models\n",
      "  [ ] LANG:bs............. Bosnian              packages and models\n",
      "  [ ] LANG:ca............. Catalan              packages and models\n",
      "  [ ] LANG:ce............. Chechen              packages and models\n",
      "  [ ] LANG:ceb............ Cebuano              packages and models\n",
      "  [ ] LANG:cs............. Czech                packages and models\n",
      "  [ ] LANG:cv............. Chuvash              packages and models\n",
      "  [ ] LANG:cy............. Welsh                packages and models\n",
      "  [ ] LANG:da............. Danish               packages and models\n",
      "  [ ] LANG:de............. German               packages and models\n",
      "  [ ] LANG:diq............ diq                  packages and models\n",
      "  [ ] LANG:dv............. Divehi               packages and models\n",
      "  [ ] LANG:el............. Greek                packages and models\n",
      "  [P] LANG:en............. English              packages and models\n",
      "  [ ] LANG:eo............. Esperanto            packages and models\n",
      "  [ ] LANG:es............. Spanish              packages and models\n",
      "  [ ] LANG:et............. Estonian             packages and models\n",
      "  [ ] LANG:eu............. Basque               packages and models\n",
      "  [ ] LANG:fa............. Persian              packages and models\n",
      "  [ ] LANG:fi............. Finnish              packages and models\n",
      "  [ ] LANG:fo............. Faroese              packages and models\n",
      "  [ ] LANG:fr............. French               packages and models\n",
      "  [ ] LANG:fy............. Western Frisian      packages and models\n",
      "  [ ] LANG:ga............. Irish                packages and models\n",
      "  [ ] LANG:gan............ gan                  packages and models\n",
      "  [ ] LANG:gd............. Scottish Gaelic      packages and models\n",
      "  [ ] LANG:gl............. Galician             packages and models\n",
      "  [ ] LANG:gu............. Gujarati             packages and models\n",
      "  [ ] LANG:gv............. Manx                 packages and models\n",
      "  [ ] LANG:he............. Hebrew               packages and models\n",
      "  [ ] LANG:hi............. Hindi                packages and models\n",
      "  [ ] LANG:hif............ hif                  packages and models\n",
      "  [ ] LANG:hr............. Croatian             packages and models\n",
      "  [ ] LANG:hsb............ Upper Sorbian        packages and models\n",
      "  [ ] LANG:ht............. Haitian              packages and models\n",
      "  [ ] LANG:hu............. Hungarian            packages and models\n",
      "  [ ] LANG:hy............. Armenian             packages and models\n",
      "  [ ] LANG:ia............. Interlingua          packages and models\n",
      "  [ ] LANG:id............. Indonesian           packages and models\n",
      "  [ ] LANG:ilo............ Iloko                packages and models\n",
      "  [ ] LANG:io............. Ido                  packages and models\n",
      "  [ ] LANG:is............. Icelandic            packages and models\n",
      "  [ ] LANG:it............. Italian              packages and models\n",
      "  [ ] LANG:ja............. Japanese             packages and models\n",
      "  [ ] LANG:jv............. Javanese             packages and models\n",
      "  [ ] LANG:ka............. Georgian             packages and models\n",
      "  [ ] LANG:kk............. Kazakh               packages and models\n",
      "  [ ] LANG:km............. Khmer                packages and models\n",
      "  [ ] LANG:kn............. Kannada              packages and models\n",
      "  [ ] LANG:ko............. Korean               packages and models\n",
      "  [ ] LANG:ku............. Kurdish              packages and models\n",
      "  [ ] LANG:ky............. Kyrgyz               packages and models\n",
      "  [ ] LANG:la............. Latin                packages and models\n",
      "  [ ] LANG:lb............. Luxembourgish        packages and models\n",
      "  [ ] LANG:li............. Limburgish           packages and models\n",
      "  [ ] LANG:lmo............ lmo                  packages and models\n",
      "  [ ] LANG:lt............. Lithuanian           packages and models\n",
      "  [ ] LANG:lv............. Latvian              packages and models\n",
      "  [ ] LANG:mg............. Malagasy             packages and models\n",
      "  [ ] LANG:mk............. Macedonian           packages and models\n",
      "  [ ] LANG:ml............. Malayalam            packages and models\n",
      "  [ ] LANG:mn............. Mongolian            packages and models\n",
      "  [ ] LANG:mr............. Marathi              packages and models\n",
      "  [ ] LANG:ms............. Malay                packages and models\n",
      "  [ ] LANG:mt............. Maltese              packages and models\n",
      "  [ ] LANG:my............. Burmese              packages and models\n",
      "  [ ] LANG:ne............. Nepali               packages and models\n",
      "  [ ] LANG:nl............. Dutch                packages and models\n",
      "  [ ] LANG:nn............. Norwegian Nynorsk    packages and models\n",
      "  [ ] LANG:no............. Norwegian            packages and models\n",
      "  [ ] LANG:oc............. Occitan              packages and models\n",
      "  [ ] LANG:or............. Oriya                packages and models\n",
      "  [ ] LANG:os............. Ossetic              packages and models\n",
      "  [ ] LANG:pa............. Punjabi              packages and models\n",
      "  [ ] LANG:pam............ Pampanga             packages and models\n",
      "  [ ] LANG:pl............. Polish               packages and models\n",
      "  [ ] LANG:pms............ pms                  packages and models\n",
      "  [ ] LANG:ps............. Pashto               packages and models\n",
      "  [ ] LANG:pt............. Portuguese           packages and models\n",
      "  [ ] LANG:qu............. Quechua              packages and models\n",
      "  [ ] LANG:rm............. Romansh              packages and models\n",
      "  [ ] LANG:ro............. Romanian             packages and models\n",
      "  [ ] LANG:ru............. Russian              packages and models\n",
      "  [ ] LANG:sa............. Sanskrit             packages and models\n",
      "  [ ] LANG:sah............ Sakha                packages and models\n",
      "  [ ] LANG:scn............ Sicilian             packages and models\n",
      "  [ ] LANG:sco............ Scots                packages and models\n",
      "  [ ] LANG:se............. Northern Sami        packages and models\n",
      "  [ ] LANG:sh............. Serbo-Croatian       packages and models\n",
      "  [ ] LANG:si............. Sinhala              packages and models\n",
      "  [ ] LANG:sk............. Slovak               packages and models\n",
      "  [ ] LANG:sl............. Slovenian            packages and models\n",
      "  [ ] LANG:sq............. Albanian             packages and models\n",
      "  [ ] LANG:sr............. Serbian              packages and models\n",
      "  [ ] LANG:su............. Sundanese            packages and models\n",
      "  [ ] LANG:sv............. Swedish              packages and models\n",
      "  [ ] LANG:sw............. Swahili              packages and models\n",
      "  [ ] LANG:szl............ szl                  packages and models\n",
      "  [ ] LANG:ta............. Tamil                packages and models\n",
      "  [ ] LANG:te............. Telugu               packages and models\n",
      "  [ ] LANG:tg............. Tajik                packages and models\n",
      "  [ ] LANG:th............. Thai                 packages and models\n",
      "  [ ] LANG:tk............. Turkmen              packages and models\n",
      "  [ ] LANG:tl............. Tagalog              packages and models\n",
      "  [ ] LANG:tr............. Turkish              packages and models\n",
      "  [ ] LANG:tt............. Tatar                packages and models\n",
      "  [ ] LANG:ug............. Uyghur               packages and models\n",
      "  [ ] LANG:uk............. Ukrainian            packages and models\n",
      "  [ ] LANG:ur............. Urdu                 packages and models\n",
      "  [ ] LANG:uz............. Uzbek                packages and models\n",
      "  [ ] LANG:vec............ vec                  packages and models\n",
      "  [ ] LANG:vi............. Vietnamese           packages and models\n",
      "  [ ] LANG:vls............ vls                  packages and models\n",
      "  [ ] LANG:vo............. Volapük              packages and models\n",
      "  [ ] LANG:wa............. Walloon              packages and models\n",
      "  [ ] LANG:war............ Waray                packages and models\n",
      "  [ ] LANG:yi............. Yiddish              packages and models\n",
      "  [ ] LANG:yo............. Yoruba               packages and models\n",
      "  [ ] LANG:zh............. Chinese              packages and models\n",
      "  [ ] LANG:zhc............ Chinese Character    packages and models\n",
      "  [ ] LANG:zhw............ zhw                  packages and models\n",
      "  [ ] TASK:counts2........ counts2\n",
      "  [ ] TASK:embeddings2.... embeddings2\n",
      "  [ ] TASK:ner2........... ner2\n",
      "  [P] TASK:sentiment2..... sentiment2\n",
      "  [ ] TASK:tsne2.......... tsne2\n",
      "\n",
      "([*] marks installed packages; [P] marks partially installed collections)\n"
     ]
    }
   ],
   "source": [
    "downloader.list(show_packages=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
